Overview

This project is investigating programming language technology (program
analysis and program rewriting) for defending software systems against
attacks from mobile code and system extensions. The approach promises to
support a wide range of flexible, fine-grained access-control and
information-flow policies. Only a small trusted computing base seems to be
required, and the run-time costs of enforcement should be low.

Our progress over the past year is summarized below. Details can be found
in the publications whose citations are given following all the summaries.
A list of DoD interactions and technology transitions appears at the end
of the report.

In-lined  Reference  Monitors 

This past year, working with Ph.D. student Kevin Hamlen, Morrisett and
Schneider developed a more refined characterization of what policies can be
enforced using reference monitors. This new work extends earlier work by
Schneider, now taking into account the limits of computability. Specifically,
we developed a model based on standard Turing machines, adapted Schneider’s
criteria for enforceable security policies, and introduced computability
requirements. We also integrated static analysis and program rewriting into
the model.

By providing this unifying model, and by basing it on Turing machines,
we were able to compare the relative power of the various enforcement
mechanisms and to relate them to standard computability results. For
instance, it was relatively easy to show that the class of policies precisely
supported by static analysis could also be supported both by reference
monitors and by program rewriting. In addition, we found that introducing a
computability requirement on reference monitors was necessary, but not
sufficient, for precise characterization of the class of policies actually
realizable by reference monitors. And we identified a new property, which we
call “punctuality,” that provides a more accurate upper bound on the power of
reference monitors.
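
As a rough illustration of the kind of mechanism being characterized, a
reference monitor can be pictured as an automaton that observes each
security-relevant event and halts the target before a violating event takes
effect. The sketch below is not taken from the project's formal model; the
event names, the "no send after read" policy, and all class and function
names are assumptions chosen only for the example.

```python
class PolicyViolation(Exception):
    """Raised when the monitor halts the target program."""


class ReferenceMonitor:
    """A tiny security automaton: a current state plus a transition function.

    transition(state, event) returns the next state, or None when the
    event would violate the policy.
    """

    def __init__(self, start, transition):
        self.state = start
        self.transition = transition

    def check(self, event):
        nxt = self.transition(self.state, event)
        if nxt is None:
            raise PolicyViolation(
                f"event {event!r} rejected in state {self.state!r}")
        self.state = nxt


def no_send_after_read(state, event):
    """Hypothetical policy: once a 'read' has occurred, 'send' is forbidden."""
    if event == "read":
        return "has_read"
    if event == "send" and state == "has_read":
        return None
    return state


def run_monitored(events, monitor):
    """Perform events in order, consulting the monitor before each one."""
    performed = []
    for event in events:
        monitor.check(event)      # halts (raises) before a violating event
        performed.append(event)
    return performed
```

Under these assumptions, the sequence send, read is allowed, while read, send
is stopped at the send. The monitor decides using only the events seen so
far, which is the restriction the characterization above makes precise.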

Our  most  surprising  and  important  results  involve  program  rewriting. 
We  can  show  that  the  class  of  policies  originally  characterized  by  Schneider 
does  not  include  all  policies  enforceable  through  rewriting  (and  vice  versa). 
Indeed,  we  were  able  to  show  that  the  class  of  policies  enforceable  through 
rewriting does not correspond to any class of the Kleene hierarchy. This
shows that rewriting truly is a powerful security enforcement technique.

Progress  on  Prototype  IRM.  Last  year,  we  developed  a  prototype  IRM 
rewriter  for  the  Microsoft  CIL,  which  takes  a  limited  class  of  policies  written 
in  a  very  primitive  specification  language.  In  essence,  the  policy  writer  could 
only  specify  that  certain  (non-virtual)  method  calls  should  be  replaced  with 
alternative method calls. Though the tool was limited, we showed that it
could be used to effectively enforce practical policies.
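
The call-replacement style of policy described above can be sketched on a toy
instruction list. This is only an illustration: the tuple encoding of
instructions, the method names, and the replacement table are hypothetical
and do not reflect the actual CIL format or the prototype's specification
language.

```python
def rewrite_calls(program, replacements):
    """Return a copy of program in which selected call targets are replaced.

    program      -- list of (opcode, operand) pairs (a toy stand-in for code)
    replacements -- dict mapping an original method name to its guarded
                    alternative (both names are hypothetical)
    """
    rewritten = []
    for opcode, operand in program:
        if opcode == "call" and operand in replacements:
            # Redirect the call to the wrapper that enforces the policy.
            rewritten.append((opcode, replacements[operand]))
        else:
            rewritten.append((opcode, operand))
    return rewritten


# Hypothetical untrusted program and policy.
untrusted = [
    ("ldstr", "report.txt"),
    ("call", "File.Delete"),
    ("call", "Console.WriteLine"),
]
policy = {"File.Delete": "Guard.CheckedDelete"}
safe = rewrite_calls(untrusted, policy)
```

Here only the call to the hypothetical File.Delete is redirected; every other
instruction is copied unchanged, which mirrors the limited expressiveness of
the first prototype.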

This  year,  we  have  extended  the  rewriting  tools  so  that  we  can  perform 
arbitrary  rewriting  on  the  CIL  code.  This  was  accomplished  by  building  on  a 
bytecode-rewriting toolkit developed by Microsoft researchers. In Fall 2002,
Kevin  Hamlen  and  Greg  Morrisett  visited  Microsoft  Research  in  Cambridge 
to  further  develop  the  APIs  and  code  for  doing  this  manipulation. 


Cyclone  Compiler 

Today, our computing and communications infrastructure is built using
unsafe, error-prone languages such as C or C++ where buffer overruns,
format string errors, and space leaks are not only possible, but
frighteningly common. In contrast, type-safe languages, such as Java,
Scheme, and ML, ensure that such errors either cannot happen (through static
type-checking and automatic memory management) or at least are caught at the
point of failure (through dynamic type and bound checks). This fail-stop
guarantee is not a total solution, but it isolates the effects of failures,
facilitates testing and determination of the true source of failures, and
enables tools and methodologies for achieving greater levels of assurance.

The  obvious  question  is:  “Why  don’t  we  re-code  our  infrastructure  using 
type-safe  languages?”  Though  such  a  technical  solution  looks  good  on  paper, 
the  cost  is  simply  too  large.  For  instance,  today’s  operating  systems  consist 
of  tens  of  millions  of  lines  of  code.  Throwing  away  all  of  that  C  code  and 
reimplementing it in, say, Java is simply too expensive.

As  a  step  towards  these  goals,  we  have  been  developing  Cyclone,  a  type- 
safe  programming  language  based  on  C.  The  type  system  of  Cyclone  accepts 
many  C  functions  without  change  and  uses  the  same  data  representations 
and  calling  conventions  as  C  for  a  given  type  constructor.  The  Cyclone 
type  system  also  rejects  many  C  programs  to  ensure  safety.  For  instance,  it 
rejects  programs  that  perform  (potentially)  unsafe  casts,  that  use  unions  of 
incompatible  types,  that  (might)  fail  to  initialize  a  location  before  using  it, 
that  use  certain  forms  of  pointer  arithmetic,  or  that  attempt  to  do  certain 
forms  of  memory  management. 

All of the analyses used by Cyclone are local (i.e., intra-procedural)
so  that  we  can  ensure  scalability  and  separate  compilation.  The  analyses 
have  also  been  carefully  constructed  to  avoid  unsoundness  in  the  presence  of 
threads.  The  price  paid  is  that  programmers  must  sometimes  change  type 
definitions  or  prototypes  of  functions,  and  occasionally  they  must  rewrite 
code. 

We  find  that  programmers  must  touch  about  10%  of  the  code  when 
porting  from  C  to  Cyclone.  Most  of  the  changes  involve  choosing  pointer 
representations and only a very few involve region annotations (around 0.6%
of the total changes). This past year, we developed a semi-automatic tool
that  can  be  used  to  automate  most  of  these  changes. 

The performance overhead of the dynamic checks depends upon the
application. For systems applications, such as a simple web server, we see no
overhead at all. This is not surprising, as these applications tend to be
I/O-bound. For scientific applications, we were seeing a much larger overhead
(around 5x for a naive port, and 3x with an experienced programmer), due
to array bounds and null pointer checks. To avoid these, over the past year
we incorporated a sophisticated intra-procedural analysis that eliminates
most of those checks. For instance, a simple matrix-multiply now runs as
fast as C code, where before, it was taking over 5x as long.
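
One way to picture why such an analysis helps is to compare a loop that
checks every array access with one where a single check before the loop
covers all iterations. The sketch below only models the idea by counting
checks; it is not Cyclone's analysis, and the function names are invented for
the example.

```python
def checked_get(xs, i, counter):
    """Array read with an explicit bounds check; counter[0] tallies checks."""
    counter[0] += 1
    if not 0 <= i < len(xs):
        raise IndexError(i)
    return xs[i]


def sum_naive(xs, n):
    """Sum xs[0..n-1], paying one bounds check per element."""
    checks = [0]
    total = 0
    for i in range(n):
        total += checked_get(xs, i, checks)
    return total, checks[0]


def sum_hoisted(xs, n):
    """Sum xs[0..n-1] after one up-front check that covers the whole loop."""
    if not 0 <= n <= len(xs):
        raise IndexError(n)
    total = 0
    for i in range(n):
        # 0 <= i < n <= len(xs) holds here, so no per-access check is
        # modelled (Python itself still checks, but that is incidental).
        total += xs[i]
    return total, 1
```

For a three-element list, the naive version performs three checks and the
hoisted version performs one, with the same result. A static analysis that
can prove the loop invariant removes the per-iteration cost entirely.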

We also introduced new typing mechanisms that support a wider range of
safe memory management options. Before, we had to restrict programmers
to using only garbage collection, stack allocation, or limited forms of region
allocation, all of which could adversely affect time and space requirements.
This year, we added support for dynamic region allocation, unique pointers,
and reference-counted objects. These mechanisms let programmers control
memory management overheads without sacrificing safety. For instance,
we were able to improve the throughput of the MediaNet streaming media
server by up to 42% and decrease the memory requirements from 8MB to a
few kilobytes using these new features.

Finally,  as  part  of  his  Ph.D.  dissertation,  Daniel  Grossman  designed 
extensions  that  support  type-safe  multi-threading.  These  extensions,  which 
we  plan  to  implement  in  the  next  year,  statically  ensure  the  absence  of  data 
races  in  programs,  thereby  avoiding  another  wide  class  of  security  problems. 


Secure  Program  Partitioning 

We  continue  our  work  in  developing  secure  program  partitioning,  a  novel  way 
to  ensure  that  data  confidentiality  and  integrity  are  preserved  in  distributed 
systems  that  contain  untrusted  hosts  and  mutually  distrusting  principals. 
This problem is particularly relevant to information systems used by
mutually distrusting organizations, such as the dynamic coalitions that arise in
military  settings. 

In our approach, programs are automatically partitioned into communicating
subprograms that run on the available, partially trusted hosts. The
partitioning  automatically  extracts  a  secure  communications  protocol,  so 
that  if  any  host  is  subverted,  then  only  those  principals  that  have  explicitly 
stated  trust  in  that  host  need  fear  a  violation  of  confidentiality.  That  is, 
for  a  given  principal  p,  the  partitioned  program  we  create  is  robust  against 
attacks  on  hosts  not  trusted  by  p.  To  protect  data  integrity,  information 
and  code  are  also  replicated  across  the  available  hosts.  Some  replicas  may 
be  securely  hashed  to  protect  them  against  subversion  of  the  host  on  which 
they  are  executing. 
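
The confidentiality side of this placement rule can be sketched as a simple
constraint: a piece of data may be placed only on a host that every owner of
that data trusts. The sketch below is a simplification under that assumption,
not the Jif/split algorithm; the data names, principals, hosts, and the
"first eligible host" choice are all hypothetical.

```python
def eligible_hosts(owners, host_trust):
    """Hosts trusted by every principal in owners, in sorted order."""
    return sorted(h for h, trusted_by in host_trust.items()
                  if owners <= trusted_by)


def partition(fields, host_trust):
    """Assign each field to some host that all of its owners trust.

    fields     -- dict: field name -> set of principals who own the data
    host_trust -- dict: host name -> set of principals who trust that host
    """
    placement = {}
    for name, owners in fields.items():
        hosts = eligible_hosts(owners, host_trust)
        if not hosts:
            raise ValueError(f"no host is trusted by all owners of {name}")
        placement[name] = hosts[0]   # arbitrary choice among safe hosts
    return placement


# Hypothetical two-party auction with a jointly trusted host T.
fields = {
    "alice_bid": {"Alice"},
    "bob_bid": {"Bob"},
    "winner": {"Alice", "Bob"},
}
host_trust = {"A": {"Alice"}, "B": {"Bob"}, "T": {"Alice", "Bob"}}
placement = partition(fields, host_trust)
```

With this rule, subverting host B can expose only data whose owners all
stated trust in B, so Alice's bid is unaffected. A real partitioner would
also weigh integrity, replication, and communication cost when choosing among
eligible hosts.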

We have implemented these techniques in Jif/split, an extension to our
publicly released Jif compiler that statically enforces information flow
control, in conjunction with a distributed run-time system that securely
executes partitioned and replicated programs while guarding against subverted
or malicious hosts. New protocols have been developed to permit secure
transfer of control between one group of host replicas and another. To
understand the practicality of our approach, secure distributed systems have
been implemented using Jif/split, including various secure auction protocols.
Performance of the system is quite reasonable, despite the fine-grained
program partitioning. We are now investigating availability policies in this
framework, which should help defend against denial of service attacks.

Information Flow Semantics. We have also been investigating how to
define and enforce information flow policies in concurrent, probabilistic, and
nondeterministic systems. Concurrent systems are naturally nondeterministic,
because a thread scheduler must decide when to allow various threads
to execute, and this decision is beyond the programmer’s control.
Nondeterminism is dangerous, because it allows covert communication between
threads, using timing. We have given a new formal definition of security
for  concurrent  systems;  this  definition  seems  to  correspond  more  closely  to 
an  intuitive  notion  of  security  than  previous  definitions  do.  Further,  we 
have  defined  an  expressive  core  concurrent  programming  language,  which 
is  equipped  with  a  type  system  for  a  static  analysis  that  ensures  programs 
written  in  this  language  are  secure.  Incorporation  of  this  static  analysis  into 
the  Jif  framework  is  an  obvious  next  step. 

In  related  ongoing  work,  we  are  exploring  expressive  security  conditions 
for  systems  incorporating  probabilistic  and  nondeterministic  computation. 
Our  goal  is  to  bound  information  flows.  Most  information  flow  analyses  are 
useful  only  for  showing  that  there  is  no  information  flow,  but  many  real- 
world  systems  (for  example,  password  checkers)  leak  acceptable  amounts 
of  information.  We  have  developed  an  appropriate  program  semantics  for 
modeling  such  systems  and  are  working  towards  a  logic  that  can  be  used  to 
prove  expressive  assertions  about  bounded  information  flow. 
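
To make the password-checker example concrete, one common way to quantify
such a leak is Shannon entropy: if the secret is uniform over N candidates,
the answer to a single guess reveals on average the binary entropy of 1/N
bits. This measure and the uniform prior are assumptions made for the
illustration; the semantics and logic under development in the project may
measure leakage differently.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no outcome that is 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def guess_leakage_bits(num_passwords):
    """Expected bits revealed by one accept/reject answer to one guess,
    assuming the password is uniform over num_passwords candidates."""
    if num_passwords < 1:
        raise ValueError("need at least one candidate password")
    return binary_entropy(1.0 / num_passwords)
```

With two candidate passwords a single guess reveals one full bit, while with
a million candidates it reveals only a small fraction of a bit. A purely
qualitative analysis would reject the checker in both cases, which is why a
bound on the amount of flow is the more useful guarantee.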


Avoiding  Malicious  Firmware 

After  power-up,  most  computing  devices  enter  a  boot  phase  in  which  the 
hardware configuration is recognized, devices are initialized, and the
operating system is loaded and started. The program that controls this process
is called boot firmware and is typically stored in ROM or other non-volatile
memory. The recent trend is that boot firmware is becoming more complex,
as it gains additional responsibilities. Until recently, malicious boot
firmware  received  relatively  little  attention.  Several  factors  now  conspire  to 
make  for  a  very  worrisome  form  of  attack: 

•  Boot firmware runs in a privileged mode on bare hardware, prior to
the start of most security services. Malicious boot firmware could
cause  harm  in  several  different  ways,  most  notably  by  corrupting  the 
operating  system.  Many  current  security  mechanisms  assume  that  the 
operating  system  can  be  trusted.  Thus,  malicious  boot  firmware  could 
subvert  most  of  the  security  mechanisms  currently  deployed  at  the  OS, 
application,  and  enterprise  levels. 

•  Modern boot firmware often consists of modules contributed by multiple
vendors, many of whom might not be visible to end users. Often
these modules are boot-time device drivers for distinct pieces of
hardware. The configuration of boot firmware can be quite volatile
as the hardware configuration changes. Many devices support
semi-automated firmware upgrades, so there are many opportunities to
introduce malicious boot firmware.

Thus,  we  consider  malicious  boot  firmware  to  be  a  plausible,  practical,  and 
dangerous  form  of  attack.  Exploiting  this  vulnerability  should  be  well  within 
the means of motivated adversaries such as nation-states and criminal
organizations.

We  are  focused  on  detection  of  malicious  boot  firmware  within  systems 
based  on  Open  Firmware.  Open  Firmware  is  a  mature  and  widely  used 
standard  for  boot  firmware.  Sun  Microsystems  and  Apple  both  use  boot 
firmware that conforms to the standard. The most salient feature of Open
Firmware  is  that  it  includes  an  interpreter  (or  virtual  machine)  for  fcode,  a 
lightly  compiled  form  of  the  Forth  programming  language. 

Fcode  device  drivers,  supplied  by  a  wide  range  of  relatively  anonymous 
vendors,  pose  a  significant  risk  of  introducing  malicious  code  into  the  boot 
program.  Our  concern  is  with  detection  of  malicious  fcode  using  static 
checks  performed  during  each  boot  cycle.  We  check  potentially  dangerous, 
untrusted  code  each  time,  prior  to  execution. 

Our  safety  policy  is  baked-in;  there  is  no  need  for  the  user  to  specify 
anything.  The  policy  consists  of  three  tiers. 

Tier  1:  Basic  Safety.  Included  in  this  tier  are  type  safety,  memory  safety, 
stack  safety,  and  control-flow  safety.  Type  safety  is  the  requirement  that 
each  storage  location  and  each  computational  result  has  a  well-defined  type 
that  can  be  determined  by  static  analysis  prior  to  running  the  program. 
All  assignments  and  memory  references  must  respect  those  types.  Memory 
safety is the requirement that all memory accesses are to legal (i.e.,
allocated) locations. Stack safety is the requirement that the program obeys
an appropriate discipline with respect to its own calling stack.
Control-flow safety is the requirement that jump targets are locations
containing executable instructions within an appropriate subprogram.
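
Two of these properties, stack safety and control-flow safety, lend
themselves to a small static check. The sketch below verifies a toy stack
machine by propagating the stack depth along every control-flow edge. The
instruction set is invented for the example and is not fcode, and the checks
are far simpler than those of an actual verifier.

```python
# (values popped, values pushed) for each opcode of a hypothetical machine.
EFFECT = {
    "push": (0, 1), "add": (2, 1), "drop": (1, 0),
    "jz": (1, 0), "jmp": (0, 0), "halt": (0, 0),
}


def verify(program):
    """Return True if the program passes the toy safety checks.

    Checks: known opcodes, no stack underflow, every jump or fall-through
    lands inside the program, and each instruction is always reached with
    the same stack depth.
    """
    if not program:
        return False
    depth_at = {0: 0}          # stack depth on entry to each reached pc
    worklist = [0]
    while worklist:
        pc = worklist.pop()
        op = program[pc][0]
        if op not in EFFECT:
            return False
        pops, pushes = EFFECT[op]
        if depth_at[pc] < pops:
            return False       # stack underflow
        after = depth_at[pc] - pops + pushes
        if op == "halt":
            continue
        successors = []
        if op in ("jmp", "jz"):
            successors.append(program[pc][1])
        if op != "jmp":
            successors.append(pc + 1)
        for target in successors:
            if not 0 <= target < len(program):
                return False   # control leaves the program
            if target not in depth_at:
                depth_at[target] = after
                worklist.append(target)
            elif depth_at[target] != after:
                return False   # inconsistent stack depth at a merge point
    return True
```

Because every check happens before execution, a program such as
push, push, add, halt is accepted, while one that pops from an empty stack or
jumps outside its own code is rejected without ever running.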

Tier  2:  Device  Encapsulation.  Code  supplied  by  different  vendors, 
typically  device  drivers,  will  be  loaded  into  the  boot  program  and  must 
coexist. We require that these programs respect each other’s boundaries
and that they interact only through published interfaces. Of critical importance
is  the  requirement  that  each  device  be  operated  solely  by  its  own  device 
driver. 

Tier  3:  Structural  Safety.  Code  supplied  by  vendors  will  interact  with 
Open  Firmware  services  through  an  API  that  prevents  unsafe  calls.  This 
safety  arises  for  three  reasons:  the  API  exposes  only  a  restricted,  safer 
subset  of  functionality,  the  implementation  performs  runtime  checks,  and 
our  verifier  can  further  restrict  the  way  the  API  is  invoked.  What  must  be 
verified  is  that  the  untrusted  program  really  uses  the  API  as  specified  and 
does  not  bypass  it  or  tamper  with  its  implementation. 

For  programs  written  in  high-level  languages,  these  properties  are  clear 
and often implicit in the definition of the language. For instance, Java
enforces almost all type correctness at compile time. Language features, such
as Java’s private modifier, can be used to enforce modularity. In such
languages, interaction between specific code modules is evident on inspection.
However, our verification is performed on fcode, a primitive language in
which none of this would be easy. Our verifier relies on the fact that the
fcode program is the result of compiling from a high-level language, Java.
This  special  compiler  produces  particularly  well-structured  and  annotated 
fcode,  in  which  constructs  derived  from  Java  are  readily  recognized. 

Our prototype system consists of three interlinked elements: the Java
VM-to-fcode compiler J2F, the BootSafe verifier, and a Java API for
BootSafe-compliant Open Firmware drivers along with runtime support for the API.
We  are  building  prototypes  of  these  three  elements.  Vendors  will  write  device 
drivers  in  Java  and  use  our  compiler  to  generate  fcode.  Users  will  trust  our 
verifier  and  our  runtime  support  (both  installed  in  their  boot  platform),  but 
will  not  need  to  trust  the  device  driver  code  received  from  the  vendor. 


DoD Interactions and Technology Transitions

•  As  a  consultant  to  DARPA/IPTO,  Schneider  chairs  the  independent 
evaluation  team  for  the  OASIS  Dem/Val  prototype  project.  This 
project  funds  two  consortia  to  design  a  battlespace  information  system 
intended  to  tolerate  a  class  A  Red  Team  attack  for  12  hours. 

•  Greg Morrisett spent nine months visiting Microsoft’s Cambridge
Research Laboratory, where he worked with researchers on programming
language and security technology. In particular, Morrisett worked on
the development of Microsoft’s tools for automatically finding
security flaws in production code, based on his experience with Cyclone.
He also worked with student Kevin Hamlen and Microsoft researchers
on the implementation of the .NET rewriting tool for inline reference
monitors.

•  Further  public  releases  of  Myers’  Jif  compiler  have  been  made  available 
at  the  Jif  web  site,  http://www.cs.cornell.edu/jif.  The  Jif  language 
extends  the  Java  programming  language  with  support  for  information 
flow control. The Jif compiler is implemented on top of the Polyglot
extensible compiler framework for Java. The Polyglot framework has also
been  released  publicly  at  http://www.cs.cornell.edu/projects/polyglot, 
and  researchers  at  Princeton  University  are  using  this  framework  in 
their  own  research.  The  releases  of  both  Jif  and  Polyglot  are  provided 
as  Java  source  code  and  work  on  Unix  and  Windows  platforms. 

•  AT&T Research is working with us to develop the Cyclone language,
compiler, and tools. In addition, researchers at the University of
Maryland, the University of Utah, Princeton, the University of
Pennsylvania, and Cornell are all using Cyclone to develop research
prototypes.